94 research outputs found

    Of bits and bugs

    Get PDF
    Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruitfly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key for success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins

    HHrep: de novo protein repeat detection and the origin of TIM barrels

    Get PDF
    HHrep is a web server for the de novo identification of repeats in protein sequences, which is based on the pairwise comparison of profile hidden Markov models (HMMs). Its main strength is its sensitivity, allowing it to detect highly divergent repeat units in protein sequences whose repeats could as yet only be detected from their structures. Examples include sequences with β-propellor fold, ferredoxin-like fold, double psi barrels or (βα)(8) (TIM) barrels. We illustrate this with proteins from four superfamilies of TIM barrels by revealing a clear 4- and 8-fold symmetry, which we detect solely from their sequences. This symmetry might be the trace of an ancient origin through duplication of a βαβα or βα unit. HHrep can be accessed at

    REPPER—repeats and their periodicities in fibrous proteins

    Get PDF
    REPPER (REPeats and their PERiodicities) is an integrated server that detects and analyzes regions with short gapless repeats in protein sequences or alignments. It finds periodicities by Fourier Transform (FTwin) and internal similarity analysis (REPwin). FTwin assigns numerical values to amino acids that reflect certain properties, for instance hydrophobicity, and gives information on corresponding periodicities. REPwin uses self-alignments and displays repeats that reveal significant internal similarities. Both programs use a sliding window to ensure that different periodic regions within the same protein are detected independently. FTwin and REPwin are complemented by secondary structure prediction (PSIPRED) and coiled coil prediction (COILS), making the server a versatile analysis tool for sequences of fibrous proteins. REPPER is available at

    The HHpred interactive server for protein homology detection and structure prediction

    Get PDF
    HHpred is a fast server for remote protein homology detection and structure prediction and is the first to implement pairwise comparison of profile hidden Markov models (HMMs). It allows to search a wide choice of databases, such as the PDB, SCOP, Pfam, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search results in a user-friendly format similar to that of PSI-BLAST. Search options include local or global alignment and scoring secondary structure similarity. HHpred can produce pairwise query-template alignments, multiple alignments of the query with a set of templates selected from the search results, as well as 3D structural models that are calculated by the MODELLER software from these alignments. A detailed help facility is available. As a demonstration, we analyze the sequence of SpoVT, a transcriptional regulator from Bacillus subtilis. HHpred can be accessed at

    TPRpred: a tool for prediction of TPR-, PPR- and SEL1-like repeats from protein sequences

    Get PDF
    BACKGROUND: Solenoid repeat proteins of the Tetratrico Peptide Repeat (TPR) family are involved as scaffolds in a broad range of protein-protein interactions. Several resources are available for the prediction of TPRs, however, they often fail to detect divergent repeat units. RESULTS: We have developed TPRpred, a profile-based method which uses a P-value-dependent score offset to include divergent repeat units and which exploits the tendency of repeats to occur in tandem. TPRpred detects not only TPR-like repeats, but also the related Pentatrico Peptide Repeats (PPRs) and SEL1-like repeats. The corresponding profiles were generated through iterative searches, by varying the threshold parameters for inclusion of repeat units into the profiles, and the best profiles were selected based on their performance on proteins of known structure. We benchmarked the performance of TPRpred in detecting TPR-containing proteins and in delineating the individual repeats therein, against currently available resources. CONCLUSION: TPRpred performs significantly better in detecting divergent repeats in TPR-containing proteins, and finds more individual repeats than the existing methods. The web server is available at , and the C++ and Perl sources of TPRpred along with the profiles can be downloaded from

    On the origin of the histone fold

    Get PDF
    BACKGROUND: Histones organize the genomic DNA of eukaryotes into chromatin. The four core histone subunits consist of two consecutive helix-strand-helix motifs and are interleaved into heterodimers with a unique fold. We have searched for the evolutionary origin of this fold using sequence and structure comparisons, based on the hypothesis that folded proteins evolved by combination of an ancestral set of peptides, the antecedent domain segments. RESULTS: Our results suggest that an antecedent domain segment, corresponding to one helix-strand-helix motif, gave rise divergently to the N-terminal substrate recognition domain of Clp/Hsp100 proteins and to the helical part of the extended ATPase domain found in AAA+ proteins. The histone fold arose subsequently from the latter through a 3D domain-swapping event. To our knowledge, this is the first example of a genetically fixed 3D domain swap that led to the emergence of a protein family with novel properties, establishing domain swapping as a mechanism for protein evolution. CONCLUSION: The helix-strand-helix motif common to these three folds provides support for our theory of an 'ancient peptide world' by demonstrating how an ancestral fragment can give rise to 3 different folds

    The MPI Bioinformatics Toolkit for protein sequence analysis

    Get PDF
    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at

    Tejaas: reverse regression increases power for detecting trans-eQTLs

    Get PDF
    Trans-acting expression quantitative trait loci (trans-eQTLs) account for ≥70% expression heritability and could therefore facilitate uncovering mechanisms underlying the origination of complex diseases. Identifying trans-eQTLs is challenging because of small effect sizes, tissue specificity, and a severe multiple-testing burden. Tejaas predicts trans-eQTLs by performing L2-regularized “reverse” multiple regression of each SNP on all genes, aggregating evidence from many small trans-effects while being unaffected by the strong expression correlations. Combined with a novel unsupervised k-nearest neighbor method to remove confounders, Tejaas predicts 18851 unique trans-eQTLs across 49 tissues from GTEx. They are enriched in open chromatin, enhancers, and other regulatory regions. Many overlap with disease-associated SNPs, pointing to tissue-specific transcriptional regulation mechanisms.Fil: Banerjee, Saikat. Max Planck Institute For Biophysical Chemistry; AlemaniaFil: Simonetti, Franco Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Max Planck Institute For Biophysical Chemistry; AlemaniaFil: Detrois, Kira E.. Max Planck Institute For Biophysical Chemistry; Alemania. Universität Göttingen; AlemaniaFil: Kaphle, Anubhav. Universität Göttingen; Alemania. Max Planck Institute For Biophysical Chemistry; AlemaniaFil: Mitra, Raktim. Indian Institute of Technology; IndiaFil: Nagial, Rahul. Indian Institute of Technology; IndiaFil: Söding, Johannes. Max Planck Institute For Biophysical Chemistry; Alemania. University of Göttingen; Alemani

    A Conserved GA Element in TATA-Less RNA Polymerase II Promoters

    Get PDF
    Initiation of RNA polymerase (Pol) II transcription requires assembly of the pre-initiation complex (PIC) at the promoter. In the classical view, PIC assembly starts with binding of the TATA box-binding protein (TBP) to the TATA box. However, a TATA box occurs in only 15% of promoters in the yeast Saccharomyces cerevisiae, posing the question how most yeast promoters nucleate PIC assembly. Here we show that one third of all yeast promoters contain a novel conserved DNA element, the GA element (GAE), that generally does not co-occur with the TATA box. The distance of the GAE to the transcription start site (TSS) resembles the distance of the TATA box to the TSS. The TATA-less TMT1 core promoter contains a GAE, recruits TBP, and supports formation of a TBP-TFIIB-DNA-complex. Mutation of the promoter region surrounding the GAE abolishes transcription in vivo and in vitro. A 32-nucleotide promoter region containing the GAE can functionally substitute for the TATA box in a TATA-containing promoter. This identifies the GAE as a conserved promoter element in TATA-less promoters
    corecore